CoMaL Tracking: Tracking Points at the Object Boundaries
Traditional point-tracking algorithms such as KLT aggregate local 2D
information for feature detection and tracking, which degrades their
performance at the boundaries separating multiple objects. Recently, CoMaL
features were proposed to handle this case, but only within a simple tracking
framework in which the points are re-detected in each frame and then matched.
This is inefficient and may also lose points that are not re-detected in the
next frame. We propose a novel tracking algorithm to
accurately and efficiently track CoMaL points. For this, the level line segment
associated with the CoMaL points is matched to MSER segments in the next frame
using shape-based matching and the matches are further filtered using
texture-based matching. Experiments on several real-world applications show
improvements in speed and accuracy over both the simple re-detect-and-match
framework and KLT, especially at object boundaries.

Comment: 10 pages, 10 figures, to appear in the 1st Joint BMTT-PETS Workshop
on Tracking and Surveillance, CVPR 2017
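The two-stage matching the abstract describes can be sketched as follows. This is not the authors' code: `chamfer_distance` stands in for the shape-based matching of a point's level-line segment against candidate MSER boundary segments, and `ncc` stands in for the texture-based filter; both choices, and the tolerance parameters, are assumptions for illustration.

```python
# Hedged sketch of the shape-then-texture matching pipeline (assumptions
# noted in the lead-in; not the paper's implementation).
import math

def chamfer_distance(seg_a, seg_b):
    """Symmetric chamfer distance between two polylines given as (x, y)
    point lists; stands in for the shape-based matching step."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return 0.5 * (one_way(seg_a, seg_b) + one_way(seg_b, seg_a))

def ncc(patch_a, patch_b):
    """Normalized cross-correlation of two equal-length intensity patches;
    stands in for the texture-based filter."""
    n = len(patch_a)
    ma, mb = sum(patch_a) / n, sum(patch_b) / n
    num = sum((a - ma) * (b - mb) for a, b in zip(patch_a, patch_b))
    da = math.sqrt(sum((a - ma) ** 2 for a in patch_a))
    db = math.sqrt(sum((b - mb) ** 2 for b in patch_b))
    return num / (da * db) if da and db else 0.0

def track_point(level_line, patch, candidates, shape_tol=2.0, tex_tol=0.8):
    """Match one tracked point into the next frame.
    candidates: list of (mser_segment, mser_patch) pairs."""
    best = None
    for seg, cand_patch in candidates:
        d = chamfer_distance(level_line, seg)
        # Shape match first, then filter surviving matches by texture.
        if d < shape_tol and ncc(patch, cand_patch) > tex_tol:
            if best is None or d < best[0]:
                best = (d, seg)
    return None if best is None else best[1]
```

Because the shape test runs first, the cheaper texture correlation is only evaluated for geometrically plausible candidates.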
An Empirical Evaluation of Visual Question Answering for Novel Objects
We study the problem of answering questions about images in a harder setting,
where the test questions and corresponding images contain novel objects that
were not queried about in the training data. Such a setting is inevitable in
the real world: owing to the heavy-tailed distribution of visual categories,
some objects will not be annotated in the training set. We show that the
performance of two popular existing methods drops significantly (by up to 28%)
on novel objects compared to known objects. We
propose methods which use large existing external corpora of (i) unlabeled
text, i.e. books, and (ii) images tagged with classes, to achieve novel object
based visual question answering. We do systematic empirical studies, for both
an oracle case where the novel objects are known textually, as well as a fully
automatic case without any explicit knowledge of the novel objects, but with
the minimal assumption that the novel objects are semantically related to the
existing objects in training. The proposed methods for novel object based
visual question answering are modular and can potentially be used with many
visual question answering architectures. We show consistent improvements with
the two popular architectures and give a qualitative analysis of the cases
where the model does well and of those where it fails to improve.

Comment: 11 pages, 4 figures, accepted at CVPR 2017 (poster)
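The known-versus-novel evaluation underlying the reported accuracy drop can be sketched as below. The function name and the tuple layout are illustrative assumptions: a test question is counted as "novel" when the object it asks about never appears in the training annotations, and accuracy is reported per split.

```python
# Illustrative sketch (not the paper's code) of per-split VQA accuracy.
def split_accuracies(predictions, train_objects):
    """predictions: list of (queried_object, predicted_answer, true_answer).
    train_objects: set of object names annotated in the training data."""
    stats = {"known": [0, 0], "novel": [0, 0]}  # split -> [correct, total]
    for obj, pred, true in predictions:
        split = "known" if obj in train_objects else "novel"
        stats[split][0] += int(pred == true)
        stats[split][1] += 1
    return {s: (c / t if t else 0.0) for s, (c, t) in stats.items()}
```

The gap between the two returned accuracies is the kind of drop the abstract quantifies.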
A Generative Model For Zero Shot Learning Using Conditional Variational Autoencoders
Zero-shot learning in image classification refers to the setting where images
from some novel classes are absent from the training data, but other
information about those classes, such as natural-language descriptions or
attribute vectors, is available. This setting is important in the real world,
since one may not be able to obtain images of every possible class at training
time. While previous approaches model the relationship between the
class-attribute space and the image space via some transfer function, so as to
model the image space corresponding to an unseen class, we take a different
approach: we generate samples from the given attributes using a conditional
variational autoencoder, and use the generated samples to classify the unseen
classes. Extensive testing on four benchmark datasets shows that our model
outperforms the state of the art, particularly in the more realistic
generalized setting, where the training classes can also appear at test time
along with the novel classes.
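The classification stage described above can be sketched as follows. This is a hedged illustration, not the paper's model: `decoder_stub` stands in for a trained CVAE decoder by simply adding Gaussian noise to the attribute vector, and the generated pseudo-samples per unseen class feed a nearest-centroid classifier. All names and parameters here are assumptions.

```python
# Hedged sketch: generate samples per unseen class from its attributes
# (decoder_stub is a placeholder for a trained CVAE decoder), then
# classify test features by nearest generated-class centroid.
import random

def decoder_stub(attr, z_noise=0.1):
    """Placeholder for decoder(z, attr): attribute vector plus noise."""
    return [a + random.gauss(0.0, z_noise) for a in attr]

def build_centroids(class_attrs, n_samples=50):
    """Generate n_samples pseudo-features per unseen class; return the
    per-class mean feature vector."""
    centroids = {}
    for cls, attr in class_attrs.items():
        samples = [decoder_stub(attr) for _ in range(n_samples)]
        centroids[cls] = [sum(s[d] for s in samples) / n_samples
                          for d in range(len(attr))]
    return centroids

def classify(feature, centroids):
    """Assign a test feature to the class with the nearest centroid."""
    return min(centroids,
               key=lambda c: sum((f - m) ** 2
                                 for f, m in zip(feature, centroids[c])))
```

In the generalized setting, centroids for seen classes (built from real training features) would simply be added to the same `centroids` dictionary.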